
[feat] Support Qwen3_5 Training #143

Merged
KemingWu merged 7 commits into transformer_5.0 from feat/qwen3_5 on Mar 10, 2026

Conversation

KemingWu (Collaborator) commented on Mar 9, 2026

Motivation

Modifications

Commit Message Convention

Please follow our standardized commit message format:

  • [feat] - New features or functionality
  • [fix] - Bug fixes
  • [docs] - Documentation changes only
  • [style] - Code style changes (formatting, missing semicolons, etc.)
  • [refactor] - Code refactoring without changing functionality
  • [perf] - Performance improvements
  • [test] - Adding or updating tests
  • [chore] - Maintenance tasks, dependency updates, etc.
  • [ci] - CI/CD configuration changes

Examples:

  • [feat] add qwen omni iterable dataset support
  • [fix] resolve bagel model configuration error
  • [docs] update training guide with YAML examples

See CONTRIBUTING.md for more details.

CI/CD Checks

Your PR will automatically run the following checks:

  • Linting: Code formatting with black (line-length=120) and import sorting with isort
  • Run pre-commit run --all-files locally to verify before pushing

Checklist

  • Follow commit message convention (see above)
  • Run pre-commit run --all-files and ensure all checks pass
  • Format your code with black (line-length=120) and isort
  • Add unit tests for new functionality
  • Update documentation as needed, including docstrings or example tutorials
  • Ensure all CI/CD checks pass

@KemingWu KemingWu requested review from Luodian and kcz358 March 9, 2026 14:00
Collaborator:
This part looks unnecessary; we can use the vision iterable directly.

Collaborator:

Same for this one. There seems to be no need to override the load-from-JSON method?

kcz358 (Collaborator) left a comment:

I checked the HF and transformers repos, and it seems Qwen3.5 uses the exact same logic as Qwen3. So I think all the data processing classes can reuse the Qwen3 processor and dataset.

@KemingWu KemingWu requested a review from kcz358 March 10, 2026 03:21
Collaborator:

One thing to note here: Qwen3.5 uses hybrid attention (linear + full). Can we just use the FLOPs function for Qwen2?

kcz358 (Collaborator) left a comment:

LGTM. I think the FLOPs estimate is a bit inaccurate. If we aren't sure what the FLOPs function for the gated delta net should be, maybe we can leave it empty, or wait and see whether we can copy one from verl etc. :)

@KemingWu KemingWu merged commit b585f4a into transformer_5.0 Mar 10, 2026
3 checks passed
@KemingWu KemingWu deleted the feat/qwen3_5 branch March 10, 2026 08:10
kcz358 added a commit that referenced this pull request Mar 12, 2026
* feat(models): add transformers 5.0 compatibility

Conditionally import models incompatible with transformers >= 5.0:
- dream_dllm, qwen3_dllm, llada_dllm require transformers < 5.0
- llava_onevision1_5 requires transformers < 5.0
- Dynamically update __all__ based on transformers version
- Prevents ImportError when using transformers 5.0+
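The version gating described above can be sketched as a small helper. This is a hypothetical sketch: the function name `gated_all` and the always-available model list are illustrative; the gated model names follow the commit message.

```python
def gated_all(transformers_version: str) -> list:
    """Return the __all__ entries appropriate for a given transformers
    version string; models incompatible with >= 5.0 are gated out.
    Hypothetical helper illustrating the conditional-import approach."""
    major = int(transformers_version.split(".")[0])
    names = ["qwen3_vl"]  # always-available models (illustrative)
    if major < 5:
        # these models rely on APIs removed in transformers 5.0
        names += ["dream_dllm", "qwen3_dllm", "llada_dllm", "llava_onevision1_5"]
    return names
```

In the real module, `__all__` would be built this way at import time from `transformers.__version__`, so importing the package never touches the incompatible submodules.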

* fix(train): add group_by_length for backward compatibility

Add group_by_length parameter to TrainingArguments to maintain
compatibility with existing training configurations.

* feat(deps): allow transformers >= 4.57.1

Update transformers dependency from exact version to minimum version
to support transformers 5.0+ while maintaining backward compatibility.

* style: auto-fix lint (black + isort)

* refactor(processor): replace additional_special_tokens with all_special_tokens

Use all_special_tokens for transformers >= 5.0 compatibility while
maintaining backward compatibility with transformers < 5.0.

Changes:
- Add special_tokens property to all processor classes
- Use all_special_tokens if available (transformers >= 5.0)
- Fall back to additional_special_tokens (transformers < 5.0)
- Add <|im_start|> and <|im_end|> tokens to special_tokens list
- Cache special_tokens as instance attribute for performance

Affected processors:
- AeroDataProcessor (base class)
- BaseQwen2_5_DataProcessor (inherits from AeroDataProcessor)
- Qwen2VLDataProcessor
- Qwen2DataProcessor
- LLaVADataProcessor
- LLaVAVideoDataProcessor (inherits from LLaVADataProcessor)
- NanovlmDataProcessor
- Qwen3_VLDataProcessor (inherits from BaseQwen2_5_DataProcessor)
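The fallback logic above can be sketched as a standalone function. This is a hypothetical sketch mirroring the described refactor; the function name and the duck-typing on tokenizer attributes are assumptions.

```python
def get_special_tokens(tokenizer):
    """Prefer all_special_tokens (transformers >= 5.0); fall back to
    additional_special_tokens (transformers < 5.0). Hypothetical helper
    mirroring the special_tokens property described above."""
    if hasattr(tokenizer, "all_special_tokens"):
        tokens = list(tokenizer.all_special_tokens)
    else:
        tokens = list(getattr(tokenizer, "additional_special_tokens", []))
    # ensure the chat-template delimiters are always present
    for tok in ("<|im_start|>", "<|im_end|>"):
        if tok not in tokens:
            tokens.append(tok)
    return tokens
```

In the processors, the result would be cached as an instance attribute so the version check runs only once.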

* style: auto-fix lint (black + isort)

* refactor(processor): unify apply_chat_template usage

Use processor.apply_chat_template with tokenize=True consistently
across all processors instead of mixing with processor.tokenizer calls.

Changes:
- aero_processor: use processor.apply_chat_template(tokenize=True)[0]
- base_qwen2_5_processor: use processor.apply_chat_template(tokenize=True)[0]
- qwen2_vl_processor: use processor.apply_chat_template(tokenize=True)
- qwen3_vl_processor: use processor.apply_chat_template(tokenize=True)[0]

This ensures all processors return token IDs directly during data
preparation, improving consistency and reducing confusion.

* feat(models): add common_ops for transformer-agnostic rope index

Extract rope index calculation functions into common_ops/rope.py to
ensure consistent behavior across transformers versions.

Changes:
- Add common_ops/rope.py with qwen2_5_vl_rope_index and qwen3_vl_get_rope_index
- Update qwen2_5_vl_ops.py to use qwen2_5_vl_rope_index
- Update qwen3_vl_ops.py to use qwen3_vl_get_rope_index
- Update qwen3_vl_moe_ops.py to use qwen3_vl_get_rope_index

This ensures rope index calculations remain stable even when transformers
internal implementations change.

* fix(utils): add B200/B300 GPU FLOPS support

Add NVIDIA B200/B300 GPU FLOPS (2.25e15) to get_device_flops()
to fix MFU calculation returning 0 on B200 GPUs.

Previously, unknown GPU types returned inf FLOPS, causing MFU
to always be 0.
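A lookup of this shape could look as follows. Hypothetical sketch: the B200/B300 value (2.25e15) follows the commit message, the other entry is illustrative, and the function name matches the one mentioned above.

```python
# Peak FLOPS per GPU family; B200/B300 value from the commit message,
# H100 entry is illustrative only.
_PEAK_FLOPS = {
    "B200": 2.25e15,
    "B300": 2.25e15,
    "H100": 9.89e14,  # illustrative
}

def get_device_flops(device_name: str) -> float:
    """Return peak FLOPS for a device name; unknown devices return inf,
    which makes the MFU ratio collapse to 0 rather than raising."""
    for key, flops in _PEAK_FLOPS.items():
        if key in device_name:
            return flops
    return float("inf")
```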

* Lint

* fix(models): qwen2_5_vl transformers 5.0 compatibility

- Fix vision_model variable reference in liger kernel patch
- Support nested text_config in lce_forward
- Handle rope_scaling/rope_parameters for transformers 5.0+
- Add qwen2_5_vl to FlopsCounter model type mapping

* refactor(processor): use DataUtilities.apply_chat_template for transformers 5.0 compatibility

- Add apply_chat_template utility method to DataUtilities
- Handles dict-like return values (BatchEncoding) with use_key param
- Handles nested list wrapping from some processors
- Update all processors to use unified method
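The normalization described above can be sketched as follows. Hypothetical sketch: the wrapper signature and the `use_key` parameter name follow the commit message, but the exact handling in DataUtilities is assumed.

```python
def apply_chat_template(processor, messages, use_key="input_ids", **kwargs):
    """Version-agnostic wrapper: some processors return dict-like
    BatchEncoding objects, others return (possibly nested) token-id lists.
    Hypothetical sketch of the unified DataUtilities method."""
    out = processor.apply_chat_template(messages, tokenize=True, **kwargs)
    if hasattr(out, "keys"):
        # dict-like (BatchEncoding): pull out the requested field
        out = out[use_key]
    if out and isinstance(out[0], list):
        # some processors wrap the ids in an extra batch dimension
        out = out[0]
    return out
```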

* feat(launch): add filter_training_args for transformers 5.0 compatibility

Filter unsupported TrainingArguments parameters by inspecting
transformers.TrainingArguments.__init__ signature, avoiding errors
from deprecated or removed parameters in newer versions.
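The signature-inspection approach can be sketched like this. Hypothetical sketch: the function name follows the commit message, but passing the target class explicitly (rather than hardcoding TrainingArguments) is an assumption made to keep the example self-contained.

```python
import inspect

def filter_training_args(args_dict, cls):
    """Drop keys not accepted by cls.__init__ (e.g. TrainingArguments in a
    newer transformers release). Hypothetical sketch of the filtering."""
    params = inspect.signature(cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(args_dict)  # __init__ takes **kwargs: accept everything
    return {k: v for k, v in args_dict.items() if k in params}
```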

* fix(models): add parse_visual_output for transformers 5.0 compatibility

Visual model methods (get_image_features, get_video_features, visual())
may return tuples OR dataclass objects (BaseModelOutputWithPooling,
BaseModelOutputWithDeepstackFeatures) in transformers 5.0+.

Add parse_visual_output() to transparently handle both return types.
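The dual-return handling can be sketched as follows. Hypothetical sketch: the function name follows the commit message, and the `last_hidden_state` attribute is the standard field on transformers output dataclasses; the exact logic in the repo may differ.

```python
def parse_visual_output(output):
    """Normalize visual-encoder return values: transformers 5.0+ may return
    a dataclass (e.g. BaseModelOutputWithPooling) where older versions
    returned a tuple. Hypothetical sketch of the normalization."""
    if isinstance(output, tuple):
        return output[0]  # legacy tuple return: features come first
    if hasattr(output, "last_hidden_state"):
        return output.last_hidden_state  # dataclass return (>= 5.0)
    return output  # already a plain tensor/list
```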

* [feat] Support Qwen3_5 Training (#143)

* [feat] Support Qwen3_5 Training

* style: auto-fix lint (black + isort)

* [feat] Support Qwen3.5 Training

* optimize qwen3.5 dataset process logic

* optimize qwen3.5 dataset process logic

* flop function leave empty

---------

Co-authored-by: charlesswu <charlesswu@tencent.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* fix(processor): remove duplicate special_tokens property in qwen2_vl_processor

* fix(models): remove duplicate .to() calls in qwen2_5_omni_liger

* fix(models): define input_ids_rmpad in inputs_embeds branch to avoid NameError

* refactor(models): extract parse_visual_output to common_ops/visual.py

* refactor(processor): extract special_tokens logic to DataUtilities.get_special_tokens

* style: auto-fix lint (black + isort)

* docs: add Transformers 5.0 migration guide

Add comprehensive migration guide for transformers 5.0 compatibility.
Includes compatibility matrix, installation instructions, and troubleshooting
for Qwen3.5 (requires >= 5.3.0) and legacy models (requires < 5.0.0).

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: wukeming <108406625+KemingWu@users.noreply.github.com>
Co-authored-by: charlesswu <charlesswu@tencent.com>
Co-authored-by: mwxely <yang0756@e.ntu.edu.sg>